@felipemello1 felipemello1 commented Oct 30, 2025

This PR showcases the different styles we could use for our config system. The main question is whether we can move away from .yaml and use .py instead, on the reasoning that "config is code".

Python can enable a) better type safety, b) direct imports instead of string references, and c) more flexibility for users to define custom code within the config.

On the other hand, YAML can be easier to read, and Hydra provides some nice freebies.

Must have:

  • Lazy instantiation: Currently we don't have any way to instantiate a function through configs, e.g. 'loss: my_loss_fn'.
  • Handle complex patterns, e.g. nesting
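
For illustration, a minimal standard-library sketch of what lazy instantiation of a function from a string could look like. The `resolve` helper and the `{"loss": ...}` config shape are hypothetical and not part of this PR; `math.hypot` only stands in for a real loss function.

```python
import importlib
from typing import Any, Callable


def resolve(dotted_path: str) -> Callable[..., Any]:
    """Import and return the object named by 'module.attr' without calling it."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
    return getattr(importlib.import_module(module_name), attr)


# Hypothetical config: the function is stored as a string, as it would be in YAML.
cfg = {"loss": "math.hypot"}

# The string is only turned into a callable when the training code needs it.
loss_fn = resolve(cfg["loss"])
result = loss_fn(3.0, 4.0)
```

With a .py config the string lookup would not be needed at all, since the function could be imported and referenced directly.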

Nice to have:

  • Composition: Our code is naturally fragmented (generator, rewards, etc). Touching one config without touching the others is a relevant feature.
  • Ability to code in the config, e.g. if statements, sum values, check conditions, etc.
  • CLI overrides

Proposed options:

  1. OmegaConf / Hydra
  2. Fiddle (config library by Google)
  3. Factory + dataclasses
  4. Plain python with partials
  5. Plain python with dictionary
    TODO: toml

I would like to hear what others think about the options: should we stick with YAML and fully explore Hydra/OmegaConf, or change to a .py system?

@meta-cla meta-cla bot added the CLA Signed label Oct 30, 2025
This PR showcases different styles for our config system, exploring
the "config is code" philosophy.

Implements 5 approaches:
1. OmegaConf/Hydra (YAML-based)
2. Fiddle (Google's config library)
3. Factory + dataclasses
4. Plain Python with functools.partial
5. Plain Python with dictionaries

Each approach demonstrates handling of:
- Lazy instantiation
- Nested component instantiation
- Partial application for runtime args
- Config composition and overrides

@tianyu-l tianyu-l left a comment

Thanks for putting up these candidates! Left some initial impressions, might not be accurate.

Comment on lines +91 to +93
inner_partial = cfg2["model"].keywords["attn_config"]
inner_partial.keywords["num_heads"] = 64
model2 = cfg2["model"]()

The definition part is OK, but its consumption is confusing when it comes to nested init.
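
One way the consumption side might be made less awkward is a small helper that rebuilds the nested partial instead of reaching into `.keywords` by hand. This is only a sketch: `override_nested`, `AttnConfig`, and `Model` are hypothetical stand-ins, not code from this PR.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable


@dataclass
class AttnConfig:  # hypothetical stand-in for the attention config
    num_heads: int
    head_dim: int = 64


class Model:  # hypothetical stand-in for the model
    def __init__(self, attn_config: Callable[[], AttnConfig], num_layers: int):
        self.attn = attn_config()  # the nested partial is only called here
        self.num_layers = num_layers


def base_cfg() -> dict:
    attn = partial(AttnConfig, num_heads=32)
    return {"model": partial(Model, attn_config=attn, num_layers=16)}


def override_nested(outer: partial, key: str, **changes) -> partial:
    """Return a new outer partial whose nested partial `key` has `changes` applied."""
    inner = outer.keywords[key]
    new_inner = partial(inner.func, *inner.args, **{**inner.keywords, **changes})
    return partial(outer.func, *outer.args, **{**outer.keywords, key: new_inner})


cfg = base_cfg()
cfg["model"] = override_nested(cfg["model"], "attn_config", num_heads=64)
model = cfg["model"]()
```

Because the helper returns new partials rather than mutating `.keywords` in place, a later call to `base_cfg()` still yields the original 32 heads.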

return base

cfg_variant = llama3_2_1b_large_lr()
attn_config_variant = cfg_variant["model"]["kwargs"]["attn_config"]["cls"](

seems too flexible, hard to read, and error-prone

Contributor Author

Agreed, it's horrible. I put it here as a contrast.

# LICENSE file in the root directory of this source tree.

"""
Dataclass config with inner Config classes.

Contributor Author

@felipemello1 felipemello1 Nov 4, 2025

That looks nice, I didn't know about it.

Comment on lines +67 to +69
cfg2.optimizer.lr = 1e-4
optimizer_partial2 = instantiate(cfg2.optimizer)
optimizer2 = optimizer_partial2(params=model2.parameters())

This looks a bit confusing.

Contributor Author

@felipemello1 felipemello1 Nov 4, 2025

Agreed, perhaps there is a better way? I'm not sure whether I can already pass the params in instantiate.

The spirit here is that I can call `instantiate(cfg)` once, so that everything gets instantiated at the very start, but when I access cfg.optimizer, I get a partial.

In this example it feels weird because instead of doing `instantiate(config)`, I am instantiating each arg of the config individually.
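
To make the "instantiate once, get partials where runtime args are missing" idea concrete, here is a toy sketch. It only mimics the `_target_` / `_partial_` keys that `hydra.utils.instantiate` recognizes and is not Hydra's implementation; `Model`, `SGD`, and this `instantiate` are hypothetical stand-ins, and targets are passed as Python callables rather than strings to keep the example self-contained.

```python
from functools import partial
from typing import Any


class Model:  # hypothetical stand-in for an nn.Module
    def __init__(self, dim: int):
        self.dim = dim

    def parameters(self) -> list:
        return [0.0] * self.dim


class SGD:  # hypothetical stand-in for an optimizer
    def __init__(self, params: list, lr: float):
        self.params, self.lr = params, lr


def instantiate(node: Any) -> Any:
    """Toy recursive instantiate; a sketch, not hydra.utils.instantiate."""
    if not isinstance(node, dict):
        return node
    kwargs = {
        k: instantiate(v)
        for k, v in node.items()
        if k not in ("_target_", "_partial_")
    }
    if "_target_" not in node:
        return kwargs  # plain container: keep it as a dict of built children
    target = node["_target_"]
    # `_partial_` defers the call until the missing runtime args are supplied.
    return partial(target, **kwargs) if node.get("_partial_") else target(**kwargs)


cfg = instantiate({
    "model": {"_target_": Model, "dim": 4},
    "optimizer": {"_target_": SGD, "_partial_": True, "lr": 1e-4},
})

model = cfg["model"]                                     # already built
optimizer = cfg["optimizer"](params=model.parameters())  # built on demand
```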

- Lazy instantiation via hydra.utils.instantiate
- Command-line override for free (--optimizer.lr=1e-4)

Cons:

Plus the learning curve: compose, instantiate, etc.

Contributor Author

@felipemello1 felipemello1 Nov 4, 2025

In practice, I think people would only use instantiate, and we can even get rid of this and instantiate the whole config once at the start. Partials can then be called where needed, e.g.:

def main(path_cfg: str):
    cfg = instantiate(load_cfg(path_cfg))

    model = cfg.model
    optimizer = cfg.optimizer(params=model.parameters())

def llama3_2_1b_full():
    output_dir = "/tmp/torchtune/llama3_2_1B/full"

    return {

For this one and the Fiddle one, I guess you can take this to the extreme and make everything a (data)class, e.g. a TrainerWithConfig class can have model, optimizer, data loader, etc. What are your thoughts?

Contributor Author

@felipemello1 felipemello1 Nov 4, 2025

My issue with this pattern of TrainerWithConfig.config() is that it's too opinionated and impacts the entire codebase. Third-party utilities, or local code that doesn't need to be a class, must be represented as one.
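
A rough sketch of what that looks like in practice, assuming an inner `Config` dataclass with a `build()` method. All names here (`Optimizer`, `Trainer`, `linear_warmup`, `LinearWarmup`) are hypothetical and only illustrate the shape of the pattern, not the code in this PR.

```python
from dataclasses import dataclass, field


class Optimizer:
    @dataclass
    class Config:
        lr: float = 3e-4

        def build(self, params: list) -> "Optimizer":
            return Optimizer(self, params)

    def __init__(self, config: "Optimizer.Config", params: list):
        self.lr = config.lr
        self.params = params


class Trainer:
    @dataclass
    class Config:
        optimizer: Optimizer.Config = field(default_factory=Optimizer.Config)
        steps: int = 100

        def build(self) -> "Trainer":
            return Trainer(self)

    def __init__(self, config: "Trainer.Config"):
        self.steps = config.steps
        self.optimizer = config.optimizer.build(params=[0.0, 0.0])


def linear_warmup(step: int, warmup_steps: int) -> float:
    """A plain function, as a third-party utility might provide."""
    return min(1.0, step / warmup_steps)


class LinearWarmup:
    """Wrapper that exists only so the function fits the Config pattern."""

    @dataclass
    class Config:
        warmup_steps: int = 10

        def build(self) -> "LinearWarmup":
            return LinearWarmup(self)

    def __init__(self, config: "LinearWarmup.Config"):
        self.warmup_steps = config.warmup_steps

    def __call__(self, step: int) -> float:
        return linear_warmup(step, self.warmup_steps)


cfg = Trainer.Config(steps=10)
cfg.optimizer.lr = 1e-4  # typed attribute override
trainer = cfg.build()
scale = LinearWarmup.Config(warmup_steps=4).build()(2)
```

The overrides are type-checkable, which is the safety benefit, but the one-line `linear_warmup` function needs a whole wrapper class before it can participate.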

Between the two, I personally prefer Fiddle.

But when I read config_fiddle.py and baseline.yaml side by side, baseline.yaml is easier to parse and understand to my eyes. Perhaps because it's a simple config? I can imagine cases where using Python can be handy.

On the composability side, if you look at baseline_different_bsz.yaml, it seems easier to abstract away experimentation from infra args.

TL;DR

  1. Despite all of the hate, IMO .yaml + OmegaConf and/or Hydra is the lesser of all evils.
  2. If we want to use .py, Fiddle seems the easiest.
  3. The dataclasses.config pattern is the safest, but it impacts the entire codebase and is harder to read.
